Textual Analysis in Python (for DHers, etc).

Part Two: Natural Language Toolkit (NLTK)

A lesson for AL340 Digital Humanities Seminar (Spring 2015)

--> Double-click here to start <--

Welcome!

This is an IPython Notebook. It is made up of cells of different types (Markdown, Code, Heading). This is lesson two. Before you begin, make sure the Natural Language Toolkit (NLTK) is installed.

Your turn

Select the cell above this one by double-clicking it.

You can see that it contains text in a format called Markdown. To execute the cell, press Shift+Enter. To complete this tutorial (which is meant for a classroom), execute each cell in order and follow the instructions.

What is NLTK?

NLTK (Natural Language Toolkit) is a Python library for natural language processing (NLP). It offers a wide range of features and is easy to learn.

In this lesson we will work with just a few of those features to explore some of the possibilities for computational analysis of text.

Getting Started

First, we will import nltk.



In [1]:

    
import nltk
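
If the import above raises an error, NLTK is probably not installed yet. The cell below is a minimal sketch of one way to check, using only the standard library. The variable name `nltk_available` is made up for this illustration, and the suggested `pip install nltk` command assumes you install packages with pip.

```python
import importlib.util

# find_spec returns None when a top-level package cannot be found
nltk_available = importlib.util.find_spec("nltk") is not None

if nltk_available:
    import nltk
    print("NLTK version:", nltk.__version__)
else:
    # Assumption: packages are installed with pip in this environment
    print("NLTK is not installed. Try running: pip install nltk")
```

Some NLTK features also need extra data files (corpora, tagger models), which are downloaded separately with `nltk.download()`.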



In [ ]:

    
# nltk; collocations; concordancing; frequency distribution; dispersion plot; POS tagging
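
As a small preview of one item in the list above, here is a hedged sketch of a frequency distribution. The sample sentence and the variable names are invented for illustration, and splitting on whitespace is a simplification of real tokenization. `FreqDist` is NLTK's word-counting class. If NLTK is not installed, the sketch falls back to `collections.Counter`, which provides the same `most_common` method.

```python
try:
    from nltk import FreqDist
except ImportError:
    # Fallback so the sketch still runs without NLTK
    from collections import Counter as FreqDist

# Invented sample text; a real lesson would load a longer document
sample = "the cat sat on the mat and the dog sat too"

# Naive tokenization: split on whitespace only
tokens = sample.split()

# Count how often each token occurs
freq = FreqDist(tokens)

# The two most frequent tokens with their counts
top_two = freq.most_common(2)
print(top_two)
```

Counting words this way is often a first step in computational text analysis, because it shows at a glance which terms dominate a text.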